{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/training/pytorch/accelerate_pytorch_training_multi_instance.ipynb)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Accelerate PyTorch Training using Multiple Instances"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`TorchNano` (`bigdl.nano.pytorch.TorchNano`) supports multi-instance training that can make full usage of hardwares with multiple CPU cores or sockets (especially when the number of cores is large). Here we provide __2__ ways to achieve this: A) subclass `TorchNano` or B) use `@nano` decorator. You can choose the appropriate one depending on your (preferred) code structure."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "## Prepare Environment for BigDL-Nano"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "At first, you need to install BigDL-Nano for PyTorch:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "!pip install --pre --upgrade bigdl-nano[pytorch] # install the nightly-built version\n",
    "!source bigdl-nano-init # set environment variables"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📝 **Note**\n",
    ">\n",
    "> Before starting your PyTorch application, it is highly recommended to run `source bigdl-nano-init` to set several environment variables based on your current hardware. Empirically, these variables will greatly improve performance for most PyTorch applications on training workloads."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "> ⚠️ **Warning**\n",
    "> \n",
    "> For Jupyter Notebook users, we recommend to run the commands above, especially `source bigdl-nano-init` before jupyter kernel is started, or some of the optimizations may not take effect."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "## Pre-define Model and Dataloader"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "In this guide, we take the fine-tuning of a [ResNet-18 model](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet18.html) on [OxfordIIITPet dataset](https://pytorch.org/vision/main/generated/torchvision.datasets.OxfordIIITPet.html) as an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# Define model and dataloader\n",
    "\n",
    "from torch import nn\n",
    "from torchvision.models import resnet18\n",
    "\n",
    "class MyPytorchModule(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.model = resnet18(pretrained=True)\n",
    "        num_ftrs = self.model.fc.in_features\n",
    "        # Here the size of each output sample is set to 37.\n",
    "        self.model.fc = nn.Linear(num_ftrs, 37)\n",
    "\n",
    "    def forward(self, x):\n",
    "        return self.model(x)\n",
    "\n",
    "\n",
    "import torch\n",
    "from torchvision import transforms\n",
    "from torchvision.datasets import OxfordIIITPet\n",
    "from torch.utils.data.dataloader import DataLoader\n",
    "\n",
    "def create_train_dataloader():\n",
    "    train_transform = transforms.Compose([transforms.Resize(256),\n",
    "                                          transforms.RandomCrop(224),\n",
    "                                          transforms.RandomHorizontalFlip(),\n",
    "                                          transforms.ColorJitter(brightness=.5, hue=.3),\n",
    "                                          transforms.ToTensor(),\n",
    "                                          transforms.Normalize([0.485, 0.456, 0.406],\n",
    "                                                               [0.229, 0.224, 0.225])])\n",
    "\n",
    "    # apply data augmentation to the train_dataset\n",
    "    train_dataset = OxfordIIITPet(root=\"/tmp/data\", transform=train_transform, download=True)\n",
    "\n",
    "    # prepare data loader\n",
    "    train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)\n",
    "\n",
    "    return train_dataloader"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A) Subclass `TorchNano`"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In general, two steps are required if you choose to subclass `TorchNano`:\n",
    "\n",
    "1) import and subclass `TorchNano`, and override its `train()` method\n",
    "2) instantiate it with setting `num_processes` , then call the `train()` method\n",
    "\n",
    "For step 1, you can refer to [this page](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/Training/PyTorch/convert_pytorch_training_torchnano.html) to achieve it (for consistency, we use the same model and dataset as an example). Supposing that you've already got a well-defined subclass `MyNano`, below line will instantiate it and train your model with 2 processes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "from tqdm import tqdm\n",
    "from bigdl.nano.pytorch import TorchNano # import TorchNano\n",
    "\n",
    "# subclass TorchNano and override its train method\n",
    "class MyNano(TorchNano):\n",
    "    def train(self):\n",
    "        # Move the code for your custom training loops inside the train method\n",
    "        model = MyPytorchModule()\n",
    "        optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)\n",
    "        loss_fuc = torch.nn.CrossEntropyLoss()\n",
    "        train_loader = create_train_dataloader()\n",
    "\n",
    "        # call setup method to set up model, optimizer(s),\n",
    "        # and dataloader(s) for accelerated training\n",
    "        model, optimizer, train_loader = self.setup(model, optimizer, train_loader)\n",
    "        num_epochs = 5\n",
    "\n",
    "        for epoch in range(num_epochs):\n",
    "\n",
    "            model.train()\n",
    "            train_loss, num = 0, 0\n",
    "            with tqdm(train_loader, unit=\"batch\") as tepoch:\n",
    "                for data, target in tepoch:\n",
    "                    tepoch.set_description(f\"Epoch {epoch}\")\n",
    "                    optimizer.zero_grad()\n",
    "                    output = model(data)\n",
    "                    loss = loss_fuc(output, target)\n",
    "                    # Replace loss.backward() with self.backward(loss)\n",
    "                    self.backward(loss)\n",
    "                    optimizer.step()\n",
    "                    loss_value = loss.sum()\n",
    "                    train_loss += loss_value\n",
    "                    num += 1\n",
    "                    tepoch.set_postfix(loss=loss_value)\n",
    "            print(f'Train Epoch: {epoch}, avg_loss: {train_loss / num}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MyNano(num_processes=2).train()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; _The detailed definition of_ `MyNano` _can be found in the_ [runnable example](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/training/pytorch/accelerate_pytorch_training_multi_instance.ipynb)."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📝 **Note**\n",
    ">\n",
    "> By setting `num_processes`, CPU cores will be automatically and evenly distributed among specific number of processes, to avoid conflicts and maximize training throughput. If you would like to specify the CPU cores used by each process, You could set `cpu_for_each_process` to a list of length `num_processes`, in which each item is a list of CPU indices.\n",
    "> \n",
    "> Currently, `‘subprocess’` (default), `‘spawn’` and `‘ray’` are supported as `distributed_backend` for `TorchNano`.\n",
    "> \n",
    "> Also note that, when using data-parallel training, the batch size is equivalent to becoming `num_processes` times larger. The learning rate warm-up strategy that gradually increases the learning rate to `num_processes` times is a compensate to achieve the same effect as single instance training. Nano enables this strategy by default through `auto_lr=True`."
   ]
  },
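  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For instance, a minimal sketch of such a configuration is shown below. The CPU core indices are purely illustrative (they assume a machine with at least 4 cores); adjust them to your own hardware:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A minimal sketch: pin CPU cores 0-1 to the first process and cores 2-3 to the second one,\n",
    "# while explicitly choosing the default 'subprocess' backend\n",
    "# (the core indices are illustrative; adjust them to match your own machine)\n",
    "MyNano(num_processes=2,\n",
    "       cpu_for_each_process=[[0, 1], [2, 3]],\n",
    "       distributed_backend=\"subprocess\").train()"
   ]
  },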
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## B) Use `@nano` decorator"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`@nano` decorator is very friendly since you can only add 2 new lines (import it and wrap the training function) and enjoy the features brought by BigDL-Nano if you have already defined a PyTorch training function with a model, optimizers, and dataloaders as parameters. You can learn the usage and notes of it from [here](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/Training/PyTorch/use_nano_decorator_pytorch_training.html). The only difference when using multi-instance training is that you should specify the decorator as `@nano(num_processes=n)` with _n_ being the expected number of processes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tqdm import tqdm\n",
    "from bigdl.nano.pytorch import nano # import nano decorator\n",
    "\n",
    "@nano(num_processes=2) # apply the decorator to the training loop\n",
    "def training_loop(model, optimizer, train_loader, num_epochs, loss_func):\n",
    "\n",
    "    for epoch in range(num_epochs):\n",
    "\n",
    "        model.train()\n",
    "        train_loss, num = 0, 0\n",
    "        with tqdm(train_loader, unit=\"batch\") as tepoch:\n",
    "            for data, target in tepoch:\n",
    "                tepoch.set_description(f\"Epoch {epoch}\")\n",
    "                optimizer.zero_grad()\n",
    "                output = model(data)\n",
    "                loss = loss_func(output, target)\n",
    "                loss.backward()\n",
    "                optimizer.step()\n",
    "                loss_value = loss.sum()\n",
    "                train_loss += loss_value\n",
    "                num += 1\n",
    "                tepoch.set_postfix(loss=loss_value)\n",
    "            print(f'Train Epoch: {epoch}, avg_loss: {train_loss / num}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "model = MyPytorchModule()\n",
    "optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)\n",
    "loss_func = torch.nn.CrossEntropyLoss()\n",
    "train_loader = create_train_dataloader()\n",
    "\n",
    "training_loop(model, optimizer, train_loader, num_epochs=5, loss_func=loss_func)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; _A runnable example including this_ `training_loop` _can be seen from_ [here](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/training/pytorch/accelerate_pytorch_training_multi_instance.ipynb)."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📝 **Note**\n",
    "> \n",
    "> By setting `num_processes`, CPU cores will be automatically and evenly distributed among specific number of processes, to avoid conflicts and maximize training throughput. If you would like to specify the CPU cores used by each process, You could set `cpu_for_each_process` to a list of length `num_processes`, in which each item is a list of CPU indices.\n",
    "> \n",
    "> Currently, `‘subprocess’` (default), and `‘ray’` are supported as `distributed_backend` for `@nano` decorator (`'spawn'` is not supported by `@nano`).\n",
    "> \n",
    "> Also note that, when using data-parallel training, the batch size is equivalent to becoming `num_processes` times larger. The learning rate warm-up strategy that gradually increases the learning rate to `num_processes` times is a compensate to achieve the same effect as single instance training. Nano enables this strategy by default through `auto_lr=True`."
   ]
  },
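  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For instance, a minimal sketch that decorates the same training function but distributes it with the `'ray'` backend (assuming the dependencies required by the ray backend are installed) could look like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A minimal sketch: same training function as above, distributed with the 'ray' backend\n",
    "# (assumes the dependencies required by the ray backend are installed)\n",
    "@nano(num_processes=2, distributed_backend=\"ray\")\n",
    "def training_loop_ray(model, optimizer, train_loader, num_epochs, loss_func):\n",
    "    ...  # same training loop body as in training_loop above"
   ]
  },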
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📚 **Related Readings**\n",
    "> \n",
    "> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/install.html)\n",
    "> - [How to convert your PyTorch training loop to use TorchNano for acceleration](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/Training/PyTorch/convert_pytorch_training_torchnano.html)\n",
    "> - [How to accelerate your PyTorch training loop with \\@nano decorator](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/Training/PyTorch/use_nano_decorator_pytorch_training.html)\n",
    "> - [How to choose the number of processes for multi-instance training](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/Training/General/choose_num_processes_training.html)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0 (default, Jun 28 2018, 13:15:42) \n[GCC 7.2.0]"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "8772eaeb16382a2d9dbb95ffcb3882976733f8dc8a0780f3e0ca9a3a7dc812c0"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
